Acoustics-based baseform generation with pronunciation and/or phonotactic models
Abstract
In this paper, we describe a method to derive a phonetic pronunciation of a word using only an acoustic utterance of that word, without a priori knowledge of the word's spelling. In [5] and [6], we used a pronunciation model based on bigram statistics. Bigram statistics constrain only the left-neighbor phone and therefore yield phone sequences that are only pairwise appropriate. Here, we apply a pronunciation model in combination with a phonotactic model, which serves as a language model to constrain the phone sequences produced. Error rates with and without the phonotactic model are presented.
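The limitation of bigram statistics noted in the abstract can be illustrated with a small sketch. This is not the authors' system: the phone symbols, the toy training pronunciations, the add-one smoothing, and the function names `train_ngram` and `log_prob` are all assumptions made for illustration. With `n = 2` each phone is conditioned only on its left neighbor; a larger `n` stands in, loosely, for a model that constrains longer phone contexts.

```python
import math
from collections import Counter

def train_ngram(sequences, n):
    """Count phone n-grams and their (n-1)-phone histories (hypothetical helper)."""
    ngrams, histories, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            hist = tuple(padded[i - n + 1:i])
            ngrams[hist + (padded[i],)] += 1
            histories[hist] += 1
    return ngrams, histories, len(vocab)

def log_prob(seq, model, n):
    """Add-one smoothed log-probability of a phone sequence under the n-gram model."""
    ngrams, histories, vocab_size = model
    padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        hist = tuple(padded[i - n + 1:i])
        numerator = ngrams[hist + (padded[i],)] + 1
        denominator = histories[hist] + vocab_size
        total += math.log(numerator / denominator)
    return total

# Toy pronunciations (assumed data, not taken from the paper).
training = [["k", "ae", "t"], ["t", "ae", "k"], ["k", "ae", "t", "s"]]
bigram = train_ngram(training, 2)    # each phone sees only its left neighbor
trigram = train_ngram(training, 3)   # each phone sees two phones of left context

seen = ["k", "ae", "t"]
scrambled = ["ae", "k", "t"]
print(log_prob(seen, bigram, 2), log_prob(scrambled, bigram, 2))
print(log_prob(seen, trigram, 3), log_prob(scrambled, trigram, 3))
```

In this toy setting both models prefer the phone order that occurred in training, but the bigram score is built purely from adjacent pairs, which is the "only pairwise appropriate" behavior the abstract describes.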
Similar resources
Improved Pronunciation Modeling by Properly Integrating Better Approaches for Baseform Generation, Ranking and Pruning
In this paper, a complete framework for the pronunciation modeling process is discussed and analyzed as the integration of three distinct but mutually interacting stages, i.e., the baseform generation, baseform ranking, and baseform pruning stages. The characteristics of different techniques used in each stage and the interaction among them are then well reflected on the overall performance of pronu...
On the Adequacy of Baseform Pronunciations and Pronunciation Variants
This paper presents an approach to automatically extract and evaluate the “stability” of pronunciation variants (i.e., adequacy of the model to accommodate this variability), based on multiple pronunciations of each lexicon word and the knowledge of a reference baseform pronunciation. Most approaches toward modelling pronunciation variability in speech recognition are based on the inference (t...
Modeling Cantonese pronunciation variation by acoustic model refinement
Pronunciation variations can be roughly classified into two types: a phone change or a sound change [1][2]. A phone change happens when a canonical phone is produced as a different phone. Such a change can be modeled by converting the baseform (standard) phone to a surfaceform (actual) phone. A sound change happens at a lower, phonetic or subphonetic level within a phone and it cannot be modele...
Confidence Measures for Evaluating Pronunciation Models
In this paper, we investigate the use of confidence measures for the evaluation of pronunciation models and the employment of these evaluations in an automatic baseform learning process. The confidence measures and pronunciation models are obtained from the ABBOT hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) Large Vocabulary Continuous Speech Recognition (LVCSR) system [8]. Exp...
Pronunciation ambiguity vs. pronunciation variability in speech recognition
It is widely acknowledged that pronunciations in spontaneous speech differ significantly from citation form. For this reason, pronunciation modeling has received considerable attention in recent automatic speech recognition literature. Most of the attention however has focussed on describing an alternate pronunciation as a different sequence of phonetic units using the same inventory of phones whi...